
    How degenerate is the parametrization of neural networks with the ReLU activation function?

    Full text link
    Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the main loss function generally depends only on the realization of the neural network, i.e. the function it computes. Studying the optimization problem over the space of realizations opens up new ways to understand neural network training. In particular, usual loss functions like mean squared error and categorical cross entropy are convex on spaces of neural network realizations, which themselves are non-convex. Approximation capabilities of neural networks can be used to deal with the latter non-convexity, which allows us to establish that for sufficiently large networks local minima of a regularized optimization problem on the realization space are almost optimal. Note, however, that each realization has many different, possibly degenerate, parametrizations. In particular, a local minimum in the parametrization space need not correspond to a local minimum in the realization space. To establish such a connection, inverse stability of the realization map is required, meaning that proximity of realizations must imply proximity of the corresponding parametrizations. We present pathologies which prevent inverse stability in general and, for shallow networks, proceed to establish a restricted space of parametrizations on which we have inverse stability w.r.t. a Sobolev norm. Furthermore, we show that by optimizing over such restricted sets, it is still possible to learn any function which can be learned by optimization over unrestricted sets. Comment: Accepted at NeurIPS 2019.
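
    The inverse stability property described above can be written schematically as follows. This is only a sketch: the realization map R, the norm on parameters, the exponent alpha, and the constant C stand in for the precise quantities of the paper, which the abstract does not specify.

    \[
      \forall\, g \in \mathcal{R}:\quad
      \big\| R(\theta) - g \big\|_{W^{1,\infty}} \le \varepsilon
      \;\Longrightarrow\;
      \exists\, \theta' \text{ with } R(\theta') = g
      \;\text{ and }\;
      \| \theta - \theta' \| \le C\, \varepsilon^{\alpha}.
    \]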

    DNN Expression Rate Analysis of High-Dimensional PDEs: Application to Option Pricing

    No full text
    We analyze approximation rates by deep ReLU networks of a class of multivariate solutions of Kolmogorov equations which arise in option pricing. Key technical devices are deep ReLU architectures capable of efficiently approximating tensor products. Combining this with results concerning the approximation of well-behaved (i.e., fulfilling some smoothness properties) univariate functions, this provides insights into rates of deep ReLU approximation of multivariate functions with tensor structures. We apply this in particular to the model problem given by the price of a European maximum option on a basket of $d$ assets within the Black-Scholes model. We prove that the solution to the $d$-variate option pricing problem can be approximated up to an $\epsilon$-error by a deep ReLU network with depth $O(\ln(d)\ln(\epsilon^{-1}) + \ln(d)^2)$ and $O(d^{2+1/n}\epsilon^{-1/n})$ nonzero weights, where $n \in \mathbb{N}$ is arbitrary (with the constant implied in $O(\cdot)$ depending on $n$). The techniques developed in the constructive proof are of independent interest in the analysis of the expressive power of deep neural networks for solution manifolds of PDEs in high dimension.
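
    For readability, the complexity bound stated in the abstract can be written in display form; this only restates the figures above, with $L$ denoting the network depth and $W$ the number of nonzero weights (notation introduced here for display, not taken from the paper).

    \[
      L = O\!\big( \ln(d)\,\ln(\epsilon^{-1}) + \ln(d)^2 \big),
      \qquad
      W = O\!\big( d^{\,2+1/n}\, \epsilon^{-1/n} \big),
      \qquad n \in \mathbb{N} \text{ arbitrary}.
    \]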

    Deep Neural Network Approximation Theory

    No full text
    This paper develops fundamental limits of deep neural network learning by characterizing what is possible if no constraints are imposed on the learning algorithm and on the amount of training data. Concretely, we consider Kolmogorov-optimal approximation through deep neural networks, with the guiding theme being a relation between the complexity of the function (class) to be approximated and the complexity of the approximating network in terms of connectivity and memory requirements for storing the network topology and the associated quantized weights. The theory we develop establishes that deep networks are Kolmogorov-optimal approximants for markedly different function classes, such as unit balls in Besov spaces and modulation spaces. In addition, deep networks provide exponential approximation accuracy, i.e., the approximation error decays exponentially in the number of nonzero weights in the network, for the multiplication operation, polynomials, sinusoidal functions, and certain smooth functions. Moreover, this holds true even for one-dimensional oscillatory textures and the Weierstrass function, a fractal function, for neither of which were methods achieving exponential approximation accuracy previously known. We also show that, in the approximation of sufficiently smooth functions, finite-width deep networks require strictly smaller connectivity than finite-depth wide networks.
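
    The notion of exponential approximation accuracy used above can be sketched as follows. This is a schematic form only: the target function $f$, the networks $\Phi_M$ with at most $M$ nonzero weights, the norm, and the constants $C, c > 0$ stand in for quantities the abstract does not make explicit.

    \[
      \inf_{\Phi_M \,:\, \text{at most } M \text{ nonzero weights}}
      \big\| f - \Phi_M \big\|_{L^\infty} \le C\, 2^{-cM},
      \qquad M \in \mathbb{N}.
    \]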